Although DETR-based 3D detectors can simplify the detection pipeline and achieve direct sparse predictions, their performance still lags behind dense detectors with post-processing for 3D object detection from point clouds. DETRs usually adopt a larger number of queries than GTs (e.g., 300 queries vs. 40 objects in Waymo) in a scene, which inevitably incurs many false positives during inference. In this paper, we propose a simple yet effective sparse 3D detector, named Query Contrast Voxel-DETR (ConQueR), to eliminate the challenging false positives and achieve more accurate and sparser predictions. We observe that most false positives are highly overlapping in local regions, caused by the lack of explicit supervision to discriminate locally similar queries. We thus propose a Query Contrast mechanism to explicitly enhance queries toward their best-matched GTs over all unmatched query predictions. This is achieved by constructing positive and negative GT-query pairs for each GT, and applying a contrastive loss to enhance positive GT-query pairs against negative ones based on feature similarities. ConQueR closes the gap between sparse and dense 3D detectors, and reduces false positives by up to ~60%. Our single-frame ConQueR achieves a new state-of-the-art (SOTA) 71.6 mAPH/L2 on the challenging Waymo Open Dataset validation set, outperforming previous SOTA methods (e.g., PV-RCNN++) by over 2.0 mAPH/L2.
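The Query Contrast mechanism amounts to a contrastive objective over GT-query feature similarities: for each GT, the best-matched query is the positive and all unmatched queries are negatives. The sketch below is an illustrative InfoNCE-style stand-in, not the paper's implementation; the function names, the cosine similarity, and the temperature `tau` are assumptions.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def query_contrast_loss(gt_embed, query_embeds, pos_idx, tau=0.1):
    # Similarities between the GT embedding and every query prediction,
    # sharpened by a temperature tau.
    sims = [cosine(gt_embed, q) / tau for q in query_embeds]
    # InfoNCE: push up the positive pair's similarity relative to all
    # queries (computed with a numerically stable log-sum-exp).
    m = max(sims)
    log_denom = m + math.log(sum(math.exp(s - m) for s in sims))
    return log_denom - sims[pos_idx]
```

The loss is small when the designated positive query already dominates the similarity distribution, and large when an unmatched query is closer to the GT, which is exactly the supervision signal used to suppress locally duplicated predictions.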
Human face images usually appear across a wide range of visual scales. Existing face representations pursue the bandwidth for handling scale variation through multi-scale schemes that assemble a finite series of predefined scales. Such multi-shot schemes bring an inference burden, and the predefined scales inevitably deviate from real data. Instead, learning the scale parameters from data and using them for one-shot feature inference is a decent solution. To this end, we reform the conv layer by resorting to scale-space theory and achieve two-fold facilities: 1) the conv layer learns a set of scales from the real data distribution, each of which is fulfilled by a conv kernel; 2) the layer automatically highlights the features at the proper channels and locations corresponding to the scale of the input pattern and its presence. We then accomplish hierarchical scale attention by stacking the reformed layers, building a novel architecture named the Scale Attention Conv Neural Network (SCAN-CNN). We apply SCAN-CNN to the face recognition task and push the frontier of SOTA performance. The accuracy gain is more evident when the face images are blurry. Meanwhile, as a one-shot scheme, the inference is more efficient than multi-shot fusion. A set of tools is provided to ensure fast training of SCAN-CNN and zero increase in inference cost compared with a plain CNN.
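The core idea above is a conv kernel whose spatial extent is governed by a scale parameter that could be learned from data. A minimal 1-D sketch under scale-space theory, where each scale is realized by a Gaussian kernel, is shown below; in SCAN-CNN the scale would be a trainable parameter, whereas here it is simply a function argument, and all names are illustrative.

```python
import math

def gaussian_kernel(sigma, radius=3):
    # Discrete Gaussian kernel parameterized by a scale sigma; in SCAN-CNN
    # such scales would be learned from the data distribution.
    w = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(-radius, radius + 1)]
    s = sum(w)
    return [v / s for v in w]

def conv1d(signal, kernel):
    # Plain 1-D convolution with replicate padding at the borders.
    r = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - r, 0), len(signal) - 1)
            acc += w * signal[j]
        out.append(acc)
    return out
```

A larger sigma produces a smoother response, so a bank of such kernels with different learned sigmas responds selectively to patterns at different scales.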
Ammunition scrap inspection is an important step in the process of recycling ammunition metal scrap. Most ammunition is composed of several components, including the case, primer, powder, and projectile. Ammunition scrap containing energetics is considered potentially dangerous and should be separated before the recycling process. Manually inspecting each piece of scrap is tedious and time-consuming. We have collected a dataset of ammunition components with the goal of applying artificial intelligence to automatically classify safe and unsafe scrap pieces. First, two training datasets were manually created from visual and X-ray images of ammunition. Second, the X-ray dataset was augmented using histogram equalization, averaging, sharpening, power-law, and Gaussian blurring transforms to compensate for the lack of sufficient training data. Finally, the representative YOLOv4 object detection method was applied to detect the ammunition components and classify the scrap pieces into safe and unsafe classes, respectively. The trained models were tested against unseen data to evaluate the performance of the applied methods. The experiments demonstrate the feasibility of ammunition component detection and classification using deep learning. The datasets and pre-trained models are available at https://github.com/hadi-ghnd/scrap-classification.
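Two of the augmentation transforms named above, the power-law (gamma) transform and histogram equalization, can be sketched on flat lists of 8-bit grayscale values. This is an illustrative sketch, not the paper's augmentation code; the function names are assumptions.

```python
def gamma_transform(pixels, gamma):
    # Power-law intensity transform on 8-bit grayscale values:
    # out = 255 * (in / 255) ** gamma.
    return [round(255 * (p / 255) ** gamma) for p in pixels]

def equalize(pixels):
    # Histogram equalization: remap intensities through the cumulative
    # distribution function so the histogram spreads over the full range.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    return [round((cdf[p] - cdf_min) / max(n - cdf_min, 1) * 255) for p in pixels]
```

Such intensity remappings change image contrast without altering object geometry, which is why they are a cheap way to multiply a small X-ray training set.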
Automatic detection of traffic accidents is an important emerging topic in traffic monitoring systems. Nowadays many urban intersections are equipped with surveillance cameras connected to traffic management systems. Therefore, computer vision techniques can be a viable tool for automatic accident detection. This paper presents a new efficient framework for accident detection at intersections for traffic surveillance applications. The proposed framework consists of three hierarchical steps: efficient and accurate object detection based on the state-of-the-art YOLOv4 method, object tracking based on a Kalman filter coupled with the Hungarian algorithm for association, and accident detection through trajectory conflict analysis. A new cost function is applied for object association to accommodate occlusion, overlapping objects, and shape changes in the object tracking step. The object trajectories are analyzed to detect different types of trajectory conflicts, including vehicle-to-vehicle, vehicle-to-pedestrian, and vehicle-to-bicycle conflicts. Experimental results on real traffic video data show the feasibility of the proposed method for real-time traffic surveillance applications. In particular, trajectory conflicts, including near-accidents and accidents occurring at urban intersections, are detected with a low false alarm rate and a high detection rate. The robustness of the proposed framework is evaluated using video sequences collected from YouTube under diverse illumination conditions. The dataset is publicly available at: http://github.com/hadi-ghnd/accidentdetection.
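The association step above matches existing tracks to new detections by minimizing a matching cost. A minimal sketch with a plain 1 - IoU cost is shown below; the paper uses a custom cost function and the Hungarian algorithm, whereas this stand-in brute-forces the assignment over permutations (fine for a handful of objects) and assumes equal numbers of tracks and detections.

```python
from itertools import permutations

def iou(a, b):
    # Intersection over union of two axis-aligned boxes (x1, y1, x2, y2).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def associate(tracks, detections):
    # Minimum-cost track-to-detection assignment with a 1 - IoU cost;
    # returns, for each track, the index of its matched detection.
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(detections))):
        cost = sum(1 - iou(t, detections[j]) for t, j in zip(tracks, perm))
        if cost < best_cost:
            best, best_cost = list(perm), cost
    return best
```

In practice one would replace the permutation search with `scipy.optimize.linear_sum_assignment`, which implements the Hungarian-style solver in polynomial time.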
Variational quantum algorithms (VQAs) have shown great potential in the NISQ era. In the workflow of a VQA, the parameters of the ansatz are iteratively updated to approximate the desired quantum state. We have seen various efforts to draft better ansatzes with fewer gates. On quantum computers, the gate ansatz is eventually transformed into control signals, such as microwave pulses on transmons, and the control pulses require elaborate calibration to minimize errors such as over-rotation and under-rotation. In the case of VQAs, this calibration introduces redundancy, because the variational property of VQAs can naturally handle over-rotation and under-rotation by updating the amplitude and frequency parameters. We therefore propose PAN, a native-pulse ansatz generator framework for VQAs. We generate native-pulse ansatzes with trainable amplitude and frequency parameters. In our proposed PAN, we tune parametric pulses, which are natively supported on NISQ computers. Considering that native-pulse ansatzes do not conform to the parameter-shift rule, we need to deploy non-gradient optimizers. To constrain the number of parameters sent to the optimizer, we adopt a progressive way of generating the native-pulse ansatz. Experiments are conducted on both simulators and quantum devices to validate our method. When deployed on NISQ machines, PAN improves latency by 86% on average. PAN achieves 99.336% and 96.482% accuracy on VQE tasks for H2 and HeH+, respectively, even under the considerable noise of NISQ machines.
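Because native-pulse parameters do not obey the parameter-shift rule, PAN must rely on non-gradient optimizers over the pulse amplitudes and frequencies. The sketch below is a generic gradient-free hill climber on a toy cost, shown only to illustrate what "non-gradient optimization" means here; it is not PAN's optimizer, and the function names and hyperparameters are assumptions.

```python
import random

def optimize(cost, n_params, steps=300, sigma=0.1, seed=1):
    # Gradient-free hill climbing: perturb the current best parameter
    # vector with Gaussian noise and keep any candidate that lowers the
    # cost. No gradients (and hence no parameter-shift rule) required.
    rng = random.Random(seed)
    params = [0.0] * n_params
    best = cost(params)
    for _ in range(steps):
        cand = [p + rng.gauss(0.0, sigma) for p in params]
        c = cost(cand)
        if c < best:
            params, best = cand, c
    return params, best
```

The progressive generation in PAN serves the same purpose as keeping `n_params` small here: gradient-free methods scale poorly with dimension, so parameters are introduced a few at a time.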
Accurate and reliable 3D detection is vital for many applications, including autonomous driving vehicles and service robots. In this paper, we propose a flexible and high-performance framework for 3D temporal object detection with point cloud sequences, named MPPNet. We propose a novel three-hierarchy framework with proxy points for multi-frame feature encoding and interaction to achieve better detection. The three hierarchies conduct per-frame feature encoding, short-clip feature fusion, and whole-sequence feature aggregation, respectively. To process long point cloud sequences with reasonable computational resources, intra-group feature mixing and inter-group feature attention are proposed to form the second and third feature encoding hierarchies, which are recurrently applied to aggregate multi-frame trajectory features. The proxy points not only act as consistent object representations for each frame, but also serve as couriers that facilitate feature interaction between frames. Experiments on the large-scale Waymo Open Dataset show that our method outperforms state-of-the-art methods by large margins when applied to both short (e.g., 4-frame) and long (e.g., 16-frame) point cloud sequences. Code is available at https://github.com/open-mmlab/openpcdet.
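The cross-frame aggregation described above can be pictured as attention pooling: per-frame features are combined into one trajectory feature, weighted by their similarity to a query. This is an illustrative simplification, not MPPNet's actual attention blocks; the function names and the plain dot-product scoring are assumptions.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(query, frame_feats):
    # Aggregate per-frame (proxy-point) features into a single trajectory
    # feature, weighting each frame by its dot-product similarity to the
    # query vector.
    scores = softmax([sum(q * f for q, f in zip(query, feat)) for feat in frame_feats])
    dim = len(frame_feats[0])
    return [sum(w * feat[d] for w, feat in zip(scores, frame_feats)) for d in range(dim)]
```

Applying such attention within short groups first (intra-group mixing) and then across groups keeps the cost manageable for long sequences, since no frame ever attends over the full sequence at once.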
Face recognition is one of the most popular and long-standing topics in computer vision. With the recent development of deep learning techniques and large-scale datasets, deep face recognition has made remarkable progress and is widely used in many real-world applications. Given a natural image or video frame as input, an end-to-end deep face recognition system outputs face features for recognition. To achieve this, a typical end-to-end system has three key elements: face detection, face alignment, and face representation. Face detection locates faces in the image or frame. Face alignment then calibrates the faces to a canonical view and crops them to a normalized pixel size. Finally, in the face representation stage, discriminative features are extracted from the aligned faces for recognition. Nowadays, all three elements are fulfilled by deep convolutional neural networks. In this survey article, we present a comprehensive review of recent advances in each element of end-to-end deep face recognition, as the thriving deep learning techniques have greatly enhanced their capabilities. We first give an overview of end-to-end deep face recognition. We then review the advances in each element, covering many aspects such as up-to-date algorithm designs, evaluation metrics, datasets, performance comparisons, remaining challenges, and promising directions for future research. With this survey, we hope to contribute in two ways: first, readers can conveniently identify methods with fairly strong baselines within a subcategory for further exploration; second, one can employ suitable methods to build a state-of-the-art end-to-end face recognition system from scratch.
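The detect-align-represent pipeline described above can be sketched as a skeleton where each stage is a pluggable component. This is a hypothetical skeleton for illustration only; the stage interfaces, the cosine matching, and the threshold are assumptions, not the survey's specification.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def recognize(image, detect, align, embed, gallery, threshold=0.5):
    # End-to-end pipeline skeleton: detect faces, align each to a
    # canonical view, extract an embedding, then match against a gallery
    # of reference embeddings by cosine similarity.
    results = []
    for box in detect(image):
        emb = embed(align(image, box))
        best_id, best_sim = None, threshold
        for identity, ref in gallery.items():
            sim = cosine(emb, ref)
            if sim > best_sim:
                best_id, best_sim = identity, sim
        results.append((box, best_id))
    return results
```

In a real system each of `detect`, `align`, and `embed` would be a deep CNN; the skeleton only fixes the data flow between the three elements.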
Recent work has made significant progress in improving spatial resolution for pixelwise labeling with the Fully Convolutional Network (FCN) framework by employing Dilated/Atrous convolution, utilizing multi-scale features and refining boundaries. In this paper, we explore the impact of global contextual information in semantic segmentation by introducing the Context Encoding Module, which captures the semantic context of scenes and selectively highlights class-dependent featuremaps. The proposed Context Encoding Module significantly improves semantic segmentation results with only marginal extra computation cost over FCN. Our approach has achieved new state-of-the-art results: 51.7% mIoU on PASCAL-Context and 85.9% mIoU on PASCAL VOC 2012. Our single model achieves a final score of 0.5567 on the ADE20K test set, which surpasses the winning entry of the COCO-Place Challenge 2017. In addition, we also explore how the Context Encoding Module can improve the feature representation of relatively shallow networks for image classification on the CIFAR-10 dataset. Our 14-layer network has achieved an error rate of 3.45%, which is comparable with state-of-the-art approaches with over 10× more layers. The source code for the complete system is publicly available.
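The "selectively highlights class-dependent featuremaps" step amounts to predicting one scaling factor per channel from a global context vector and re-scaling each channel accordingly. The sketch below is a squeeze-and-excitation-style stand-in for that highlighting step, not the actual Context Encoding Module (which builds its context with a learned encoding layer); all names and the single linear layer are assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(featmaps, context, weights, bias):
    # Predict one scaling factor per channel from a global context vector
    # (one linear unit + sigmoid per channel), then re-scale each
    # channel's flattened feature map by its factor.
    scales = [sigmoid(sum(w * c for w, c in zip(row, context)) + b)
              for row, b in zip(weights, bias)]
    return [[s * v for v in fmap] for s, fmap in zip(scales, featmaps)]
```

Because the scaling is a single multiply per activation on top of a small per-channel predictor, the extra computation over the FCN backbone is marginal, consistent with the claim above.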
The development of social media user stance detection and bot detection methods relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, hindering graph-based account detection research. To address these issues, we propose the Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB is built on the largest collection of original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
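Ranking user property features by information gain, as done above to pick the top 20, means measuring how much each feature reduces the entropy of the class labels. A minimal sketch for discrete features follows; it is illustrative, not MGTAB's feature-selection code, and the function names are assumptions.

```python
import math

def entropy(labels):
    # Shannon entropy (in bits) of a list of class labels.
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def information_gain(feature, labels):
    # Reduction in label entropy after partitioning the samples by the
    # value of a discrete feature.
    n = len(labels)
    splits = {}
    for x, y in zip(feature, labels):
        splits.setdefault(x, []).append(y)
    return entropy(labels) - sum(len(s) / n * entropy(s) for s in splits.values())
```

A feature that perfectly separates the classes attains the full label entropy as its gain, while a constant feature scores zero, which is exactly the ordering used when keeping only the most informative properties.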
Image virtual try-on aims at replacing the clothes on a person image with a garment image (in-shop clothes), and has attracted increasing attention from the multimedia and computer vision communities. Prior methods successfully preserve the character of clothing images; however, occlusion remains a pernicious effect for realistic virtual try-on. In this work, we first present a comprehensive analysis of the occlusions and categorize them into two aspects: i) Inherent-Occlusion: the ghost of the former cloth still exists in the try-on image; ii) Acquired-Occlusion: the target cloth warps onto an unreasonable body part. Based on this in-depth analysis, we find that the occlusions can be simulated by a novel semantically-guided mixup module, which generates semantic-specific occluded images that work together with the try-on images to facilitate training a de-occlusion try-on (DOC-VTON) framework. Specifically, DOC-VTON first conducts sharpened semantic parsing on the try-on person. Aided by semantics guidance and a pose prior, textures of various complexities are selectively blended with human parts in a copy-and-paste manner. Then, the Generative Module (GM) is utilized to synthesize the final try-on image and to learn de-occlusion jointly. In comparison to state-of-the-art methods, DOC-VTON achieves better perceptual quality by reducing occlusion effects.
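The copy-and-paste blending guided by a semantic mask can be reduced to a per-pixel selection: wherever the mask marks an occluding region, the occluder's pixel replaces the try-on image's pixel. The sketch below is a bare-bones illustration of that binary case on 2-D grids of scalar pixels, not DOC-VTON's mixup module (which blends textures of varying complexity under semantics and pose guidance); the function name is an assumption.

```python
def semantic_mixup(image, occluder, mask):
    # Wherever the semantic mask is 1, paste the occluder pixel over the
    # try-on image in a copy-and-paste manner; elsewhere keep the image.
    return [[occ if m else px for px, occ, m in zip(irow, orow, mrow)]
            for irow, orow, mrow in zip(image, occluder, mask)]
```

Training on such synthetically occluded images alongside the clean try-on images is what lets the generator learn de-occlusion without ever needing paired real occlusion data.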